---
redirect_from:
  - /direction-of-joins
  - /many-to-many-relationship
  - /schema/fundamentals/joins
---

# Working with Joins

A join creates a relationship between two cubes in your Cube project. Cube
supports three [types of join relationships][ref-schema-ref-joins-relationship]
often found in SQL databases:

- `one_to_one`
- `one_to_many`
- `many_to_one`

To use an example, let's use two cubes, `customers` and `orders`:

<CodeTabs>

```javascript
cube(`customers`, {
  // ...

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
    },

    company: {
      sql: `company`,
      type: `string`,
    },
  },
});

cube(`orders`, {
  // ...

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
    },

    customer_id: {
      sql: `customer_id`,
      type: `number`,
    },
  },
});
```

```yaml
cubes:
  - name: customers
    # ...

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: company
        sql: company
        type: string

  - name: Orders
    # ...

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: customer_id
        sql: customer_id
        type: number
```

</CodeTabs>

We could add a join to the `customers` cube:

<CodeTabs>

```javascript
cube(`customers`, {
  // ...

  joins: {
    orders: {
      relationship: `one_to_many`,
      sql: `${CUBE}.id = ${orders.customer_id}`,
    },
  },
});
```

```yaml
cubes:
  - name: customers
    # ...

    joins:
      - name: orders
        relationship: one_to_many
        sql: "{CUBE}.id = {orders.customer_id}"
```

</CodeTabs>

The join above means a customer has many orders. Let's send the following JSON
query:

```json
{
  "dimensions": ["orders.status", "customers.company"],
  "measures": ["orders.count"],
  "timeDimensions": [
    {
      "dimension": "orders.created_at"
    }
  ],
  "order": { "customers.company": "asc" }
}
```

The query above will generate the following SQL:

```sql
SELECT
  "orders".status "orders__status",
  "customers".company "customers__company",
  count("orders".id) "orders__count"
FROM
  public.customers AS "customers"
  LEFT JOIN public.orders AS "orders"
    ON "customers".id = "orders".customer_id
GROUP BY 1, 2
ORDER BY 2 ASC
LIMIT 10000
```

However, if we have guest checkouts, that would mean we would have orders with
no matching customer. Looking back at the `one_to_many` relationship and its'
resulting SQL, any guest checkouts would be excluded from the results. To remedy
this, we'll remove the join from the `customers` cube and instead define a join
with a `many_to_one` relationship on the `orders` cube:

<CodeTabs>

```javascript
cube(`orders`, {
  // ...

  joins: {
    customers: {
      relationship: `many_to_one`,
      sql: `${CUBE}.customer_id = ${customers.id}`,
    },
  },
});
```

```yaml
cubes:
  - name: orders
    # ...

    joins:
      - name: customers
        relationship: many_to_one
        sql: "{CUBE}.customer_id = {customers.id}"
```

</CodeTabs>

In the above data model, our `orders` cube defines the relationship between
itself and the `customer` cube. The same JSON query now results in the following
SQL query:

```sql
SELECT
  "orders".status "orders__status",
  "customers".company "customers__company",
  count("orders".id) "orders__count"
FROM
  public.orders AS "orders"
  LEFT JOIN public.customers AS "customers"
    ON "orders".customer_id = "customers".id
GROUP BY 1, 2
ORDER BY 2 ASC
LIMIT 10000
```

As we can see, the base table in the query is `orders`, and `customers` is in
the `LEFT JOIN` clause; this means any orders without a customer will also be
retrieved.

<InfoBox>

In Cube, joins only need to be defined from one direction. In the above example,
we explicitly _removed_ the `one_to_many` relationship from the `customer` cube;
not doing so would cause the query to fail as Cube would be unable to determine
a valid join path. [Click here][self-join-direction] to learn more about how the
direction of joins affects query results.

</InfoBox>

## Many-to-many joins

A many-to-many relationship occurs when multiple records in a cube are
associated with multiple records in another cube.

For example, let's say we have two cubes, `topics` and `posts`, pointing to the
`topics` and `posts` tables in our database, respectively. A `post` can have
more than one `topic`, and a `topic` may have more than one `post`.

In a database, you would most likely have an associative table (also known as a
junction table or cross-reference table). In our example, this table name might
be `post_topics`.

<InfoBox>

You can [jump to this section][self-many-to-many-no-assoc-table] if you don't
have an associative table in your database.

</InfoBox>

The diagram below shows the tables `posts`, `topics`, `post_topics`, and their
relationships.

<Diagram
  alt="Many-to-Many Entity Diagram for posts, topics and post_topics"
  src="https://ucarecdn.com/61343995-dedc-40ae-9367-e21a645051ee/"
/>

In the same way the `post_topics` table was specifically created to handle this
association in the database, we need to create an associative cube
`post_topics`, and declare the relationships from it to `topics` cube and from
`posts` to `post_topics`.

<InfoBox>

The following example uses the `one_to_many` relationship on the `post_topics`
cube; this causes the direction of joins to be `posts → post_topics → topics`.
[Read more about direction of joins here][self-join-direction].

</InfoBox>

<CodeTabs>

```javascript
cube(`posts`, {
  sql_table: `posts`,

  joins: {
    post_topics: {
      relationship: `many_to_one`,
      sql: `${CUBE}.id = ${post_topics.post_id}`,
    },
  },
});

cube(`topics`, {
  sql_table: `topics`,

  dimensions: {
    post_id: {
      sql: `id`,
      type: `string`,
      primary_key: true,
    },
  },
});

cube(`post_topics`, {
  sql_table: `post_topics`,

  joins: {
    topic: {
      relationship: `one_to_many`,
      sql: `${CUBE}.topic_id = ${topics.id}`,
    },
  },

  dimensions: {
    post_id: {
      sql: `post_id`,
      type: `string`,
    },
  },
});
```

```yaml
cubes:
  - name: posts
    sql_table: posts

    joins:
      - name: post_topics
        relationship: many_to_one
        sql: "{CUBE}.id = {post_topics.post_id}"

  - name: topics
    sql_table: topics

    dimensions:
      - name: post_id
        sql: id
        type: string
        primary_key: true

  - name: post_topics
    sql_table: post_topics

    joins:
      - name: topic
        relationship: one_to_many
        sql: "{CUBE}.topic_id = {topics.id}"

    dimensions:
      - name: post_id
        sql: post_id
        type: string
```

</CodeTabs>

In scenarios where a table doesn't define a primary key, one can be generated
using SQL:

<CodeTabs>

```javascript
cube(`post_topics`, {
  // ...

  dimensions: {
    id: {
      sql: `CONCAT(${CUBE}.post_id, ${CUBE}.topic_id)`,
      type: `number`,
      primary_key: true,
    },
  },
});
```

```yaml
cubes:
  - name: post_topics
    # ...

    dimensions:
      - name: id
        sql: "CONCAT({CUBE}.post_id, {CUBE}.topic_id)"
        type: number
        primary_key: true
```

</CodeTabs>

### Using virtual associative cube

Sometimes there is no associative table in the database, when in reality, there
is a many-to-many relationship. In this case, the solution is to extract some
data from existing tables and create a virtual (not backed by a real table in
the database) associative cube.

Let’s consider the following example. We have tables `emails` and
`transactions`. The goal is to calculate the amount of transactions per
campaign. Both `emails` and `transactions` have a `campaign_id` column. We don’t
have a campaigns table, but data about campaigns is part of the `emails` table.

Let’s take a look at the `emails` cube first:

<CodeTabs>

```javascript
cube(`emails`, {
  sql_table: `emails`,

  measures: {
    count: {
      type: `count`,
    },
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
    },

    campaign_name: {
      sql: `campaign_name`,
      type: `string`,
    },

    campaign_id: {
      sql: `campaign_id`,
      type: `number`,
    },
  },
});
```

```yaml
cubes:
  - name: emails
    sql_table: emails

    measures:
      - name: count
        type: count

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: campaign_name
        sql: campaign_name
        type: string

      - name: campaign_id
        sql: campaign_id
        type: number
```

</CodeTabs>

We can extract campaigns data into a virtual `campaigns` cube:

<CodeTabs>

```javascript
cube(`campaigns`, {
  sql: `
    SELECT
      campaign_id,
      campaign_name,
      customer_name,
      MIN(created_at) AS started_at
    FROM emails
    GROUP BY 1, 2, 3
  `,

  measures: {
    count: {
      type: `count`,
    },
  },

  dimensions: {
    id: {
      sql: `campaign_id`,
      type: `string`,
      primary_key: true,
    },

    name: {
      sql: `campaign_name`,
      type: `string`,
    },
  },
});
```

```yaml
cubes:
  - name: campaigns
    sql: >
      SELECT
        campaign_id,
        campaign_name,
        customer_name,
        MIN(created_at) AS started_at
      FROM emails GROUP BY 1, 2, 3

    measures:
      - name: count
        type: count

    dimensions:
      - name: id
        sql: campaign_id
        type: string
        primary_key: true

      - name: name
        sql: campaign_name
        type: string
```

</CodeTabs>

The following diagram shows our data model with the `Campaigns` cube:

<Diagram
  alt="Many-to-Many Entity Diagram for emails, campaigns and transactions"
  src="https://ucarecdn.com/9803e369-cb00-4e00-9c61-1a3d3e90873e/"
/>

The last piece is to finally declare a many-to-many relationship. This should be
done by declaring a [`one_to_many`
relationship][ref-schema-ref-joins-relationship] on the associative cube,
`campaigns` in our case.

<CodeTabs>

```javascript
cube(`emails`, {
  sql_table: `emails`,

  joins: {
    campaigns: {
      relationship: `many_to_one`,
      sql: `${CUBE}.campaign_id = ${campaigns.id}
      AND ${CUBE}.customer_name = ${campaigns.customer_name}`,
    },
  },

  measures: {
    count: {
      type: `count`,
    },
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
    },

    campaign_name: {
      sql: `campaign_name`,
      type: `string`,
    },

    campaign_id: {
      sql: `campaign_id`,
      type: `number`,
    },
  },
});

cube(`campaigns`, {
  joins: {
    transactions: {
      relationship: `one_to_many`,
      sql: `${CUBE}.customer_name = ${transactions.customer_name}
      AND ${CUBE}.campaign_id = ${transactions.campaign_id}`,
    },
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
    },

    customer_name: {
      sql: `customer_name`,
      type: `string`,
    },
  },
});
```

```yaml
cubes:
  - name: emails
    sql_table: emails

    joins:
      - name: campaigns
        relationship: many_to_one
        sql: >
          {CUBE}.campaign_id = {campaigns.id} AND {CUBE}.customer_name =
          {campaigns.customer_name}

    measures:
      - name: count
        type: count

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: campaign_name
        sql: campaign_name
        type: string

      - name: campaign_id
        sql: campaign_id
        type: number

  - name: campaigns

    joins:
      - name: transactions
        relationship: one_to_many
        sql: >
          {CUBE}.customer_name = {transactions.customer_name} AND
          {CUBE}.campaign_id = {transactions.campaign_id}

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: customer_name
        sql: customer_name
        type: string
```

</CodeTabs>

## Directions of joins

The direction of [joins][ref-schema-ref-joins] greatly influences the final
result set. It can be explicitly controlled on a [view][ref-schema-ref-view]
level.

As an example, let's take two cubes, `orders` and `customers`:

<CodeTabs>

```javascript
cube(`orders`, {
  sql_table: `orders`,

  measures: {
    count: {
      sql: `id`,
      type: `count`,
    },
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
    },

    customer_id: {
      sql: `customer_id`,
      type: `number`,
    },
  },
});

cube(`customers`, {
  sql_table: `customers`,

  measures: {
    count: {
      sql: `id`,
      type: `count`,
    },

    total_revenue: {
      sql: `revenue`,
      type: `sum`,
    },
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
    },
  },
});
```

```yaml
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: count
        sql: id
        type: count

      - name: total_revenue
        sql: revenue
        type: sum

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: customer_id
        sql: customer_id
        type: number

  - name: customers
    sql_table: customers

    measures:
      - name: count
        sql: id
        type: count

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
```

</CodeTabs>

With the given data model, we have two valid analytics use cases that require
different join directions.

**The first case is to calculate the total revenue per customer.** To do this,
we'll use the `total_revenue` measure that is defined on the orders cube. We
need to be aware that orders can be placed without customer registration
(anonymous customers/guest checkouts). Therefore, we should start the join from
the `orders` cube onto the `customers` cube to ensure that we do not lose data
from anonymous orders.

<CodeTabs>

```yaml
views:
  - name: total_revenue_per_customer
    description: Total revenue per customer

    cubes:
      - join_path: orders
        includes:
          - total_revenue
          - created_at

      - join_path: orders.customers
        includes:
          - company
```

```javascript
view(`total_revenue_per_customer`, {});
```

</CodeTabs>

We can query this view as follows:

```json
{
  "dimensions": ["total_revenue_per_customer.company"],
  "measures": ["total_revenue_per_customer.total_revenue"],
  "timeDimensions": [
    {
      "dimension": "total_revenue_per_customer.created_at"
    }
  ]
}
```

**The second case is to find customers who have not placed any orders.** We will
use the count measure on the customers cube for that. In this case, we should
join the customers cube with the orders cube to find customers with zero orders
placed. The reverse order of joins would result in a dataset without data for
customers with no orders.

<CodeTabs>

```yaml
views:
  - name: customers_without_orders
    description: Customers without orders

    cubes:
      - join_path: customers
        includes:
          - company

      - join_path: customers.orders
        prefix: true
        includes:
          - created_at
          - count
```

```javascript

```

</CodeTabs>

We can then query the cube as follows:

```json
{
  "dimensions": ["customers_without_orders.company"],
  "timeDimensions": [
    {
      "dimension": "customers_without_orders.orders_created_at"
    }
  ],
  "filters": [
    {
      "member": "customers_without_orders.orders_count",
      "operator": "equals",
      "values": ["0"]
    }
  ]
}
```

## Transitive join pitfalls

Let's consider an example where we have a many-to-many relationship between
`users` and `companies` through the `companies_to_users` cube:

<CodeTabs>

```javascript
cube(`users`, {
  sql: `
    SELECT 1 AS id, 'Ali' AS name UNION ALL
    SELECT 2 AS id, 'Bob' AS name UNION ALL
    SELECT 3 AS id, 'Eve' AS name
  `,

  measures: {
    count: {
      type: `count`
    }
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `string`,
      primary_key: true
    }
  }
})

cube(`companies`, {
  sql: `
    SELECT 11 AS id, 'Acme Corporation' AS name UNION ALL
    SELECT 12 AS id, 'Stark Industries' AS name
  `,

  dimensions: {
    id: {
      sql: `id`,
      type: `string`,
      primary_key: true
    },

    name: {
      sql: `name`,
      type: `string`
    }
  }
})

cube(`companies_to_users`, {
  sql: `
    SELECT 11 AS company_id, 1 AS user_id UNION ALL
    SELECT 11 AS company_id, 2 AS user_id UNION ALL
    SELECT 12 AS company_id, 3 AS user_id
  `,

  joins: {
    users: {
      sql: `${CUBE}.user_id = ${users.id}`,
      relationship: `one_to_many`
    },

    companies: {
      sql: `${CUBE}.company_id = ${companies.id}`,
      relationship: `one_to_many`
    }
  },

  dimensions: {
    id: {
      // Joins require a primary key, so we'll create one on-the-fly
      sql: `CONCAT(${CUBE}.user_id, ':', ${CUBE}.company_id)`,
      type: `string`,
      primary_key: true
    }
  }
})
```

```yaml
cubes:
  - name: users
    sql: >
      SELECT 1 AS id, 'Ali' AS name UNION ALL
      SELECT 2 AS id, 'Bob' AS name UNION ALL
      SELECT 3 AS id, 'Eve' AS name
 
    measures:
      - name: count
        type: count
 
    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true
 
  - name: companies
    sql: >
      SELECT 11 AS id, 'Acme Corporation' AS name UNION ALL
      SELECT 12 AS id, 'Stark Industries' AS name
 
    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true
 
      - name: name
        sql: name
        type: string
 
  - name: companies_to_users
    sql: >
      SELECT 11 AS company_id, 1 AS user_id UNION ALL
      SELECT 11 AS company_id, 2 AS user_id UNION ALL
      SELECT 12 AS company_id, 3 AS user_id
 
    joins:
      - name: users
        sql: "{CUBE}.user_id = {users.id}"
        relationship: one_to_many
 
      - name: companies
        sql: "{CUBE}.company_id = {companies.id}"
        relationship: one_to_many
 
    dimensions:
      - name: id
        # Joins require a primary key, so we'll create one on-the-fly
        sql: "CONCAT({CUBE}.user_id, ':', {CUBE}.company_id)"
        type: string
        primary_key: true
```

</CodeTabs>

With this data model, querying `users.count` as a measure and `companies.name`
as a dimension would yield the following error: `Can't find join path to join
'users', 'companies'`.

The root cause is that joins are [directed](#direction-of-joins) and there's no
join path that goes *by join definitions in the data model* from `users` to
`companies` or in the opposite direction.

In [Data Graph][ref-data-graph], you can see that both `users` and `companies`
are to the right of `companies_to_users`, meaning that there's no way to go
from `users` to `companies` moving left to right or right to left:

<Diagram src="https://ucarecdn.com/c7374e2c-7359-4c13-adee-74cb32623dc6/" />

One possible solution is to move one of two joins from the `companies_to_users`
cube to either `users` or `companies` cube. Please note that it would affect
the query semantics and thus the final result:

<CodeTabs>

```javascript
cube(`users`, {
  joins: {
    companies_to_users: {
      sql: `${CUBE}.id = ${companies_to_users}.user_id`,
      relationship: `one_to_many`
    }
  }

  // ...
})

cube(`companies_to_users`, {
  joins: {
    // users: {
    //   sql: `${CUBE}.user_id = ${users.id}`,
    //   relationship: `one_to_many`
    // },

    companies: {
      sql: `${CUBE}.company_id = ${companies.id}`,
      relationship: `one_to_many`
    }
  }

  // ...
})
```

```yaml
cubes:
  - name: users

    joins:
      - name: companies_to_users
        sql: "{CUBE}.id = {companies_to_users}.user_id"
        relationship: one_to_many

    # ...

  - name: companies_to_users

    joins:
      # - name: users
      #   sql: "{CUBE}.user_id = {users.id}"
      #   relationship: one_to_many
 
      - name: companies
        sql: "{CUBE}.company_id = {companies.id}"
        relationship: one_to_many

    # ...
```

</CodeTabs>

Now there's a join path that goes *by join definitions in the data model* from
`users` to `companies_to_users` to `companies`. [Data Graph][ref-data-graph]
also shows that you can reach `companies` from `cubes` by going left to right.

<Diagram src="https://ucarecdn.com/0736a570-c1c2-45b5-a806-e90a216ce202/" />

[ref-schema-ref-view]: /reference/data-model/view
[ref-schema-ref-joins]: /reference/data-model/joins
[ref-schema-ref-joins-relationship]: /reference/data-model/joins#relationship
[self-many-to-many-no-assoc-table]: #many-to-many-relationship-without-an-associative-table
[self-join-direction]: /product/data-modeling/concepts/working-with-joins#directions-of-joins
[ref-data-graph]: /product/workspace/data-model#data-graph